1. Introduction and Goals
This project analyses LendingClub's loan dataset. The dataset includes loan application data, information on granted loans, and their status.
Goals
Build models for:
- Classifying `Accepted` and `Rejected` loans
- Estimating loan default risk (`loan_status`)
  - Can be used by potential investors into the platform to select a sample of loans to invest in (instead of provided grades/subgrade/etc)
- Classifying loan `grade` and `sub_grade`
Classifying loan grade and sub_grade
The goal of this project is to build several ML models that can be used to predict:
- Binary `loan_status` prediction model. We used `XGBoost` and `CatBoost` to build a model that predicts how likely an applicant is to default on a loan, based on their information and the loan size, type and other specifics. This model can be used to decide whether an application should be approved, or by a third-party investor to estimate risk. The goal of this model is to maximize the quality of the predicted probabilities rather than the classification accuracy itself. In addition to estimating the default likelihood, we'll also use this model to estimate:
  - Loan grades (A, B, C etc.) based on probability thresholds. This does not directly attempt to predict the grades defined in the dataset, but can be used as an alternative method to classify loan risk.
  - Using this model we'll calculate the profitability and risk of a loan portfolio if only and provide an improved (more restrictive) method of evaluating loan applicants.
- `loan_grade` prediction models. We'll evaluate and compare two different approaches:
  - Multi-class classification model using `XGBoost` and `CatBoost`.
  - Regression model that attempts to predict the grade converted to a normalized numerical variable (predictions are rounded). Using a regression model for a classification problem is not ideal because it ignores the fact that the spacing between classes of an ordinal feature might differ (e.g. we don't know whether the distance between grades A and B is the same as between B and C).
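The two grade-related ideas above (bucketing default probabilities into grades, and treating the grade as a rounded regression target) can be sketched as follows. This is only an illustration: the threshold values, the A to G label set, and all function names are assumptions made for the example, not values or code taken from the project.

```python
import numpy as np

# Hypothetical grade labels and probability cut-offs (assumed for illustration).
GRADE_LABELS = np.array(["A", "B", "C", "D", "E", "F", "G"])
GRADE_THRESHOLDS = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.40])


def grades_from_probabilities(p_default):
    """Assign a risk grade to each predicted default probability."""
    # np.digitize returns, for each value, the index of the bin it falls into.
    bins = np.digitize(np.asarray(p_default, dtype=float), GRADE_THRESHOLDS)
    return GRADE_LABELS[bins]


def encode_grades(grades):
    """Map grades to evenly spaced values in [0, 1] for use as a regression target."""
    index = {grade: i for i, grade in enumerate(GRADE_LABELS)}
    return np.array([index[g] for g in grades]) / (len(GRADE_LABELS) - 1)


def decode_predictions(y_pred):
    """Round regression outputs to the nearest grade, clipping out-of-range values."""
    scaled = np.asarray(y_pred, dtype=float) * (len(GRADE_LABELS) - 1)
    idx = np.clip(np.rint(scaled).astype(int), 0, len(GRADE_LABELS) - 1)
    return GRADE_LABELS[idx]


if __name__ == "__main__":
    print(grades_from_probabilities([0.02, 0.12, 0.55]))
    print(encode_grades(["A", "D", "G"]))
    print(decode_predictions([0.0, 0.48, 1.2]))
```

Note that the even spacing in `encode_grades` is exactly the simplification discussed above: it assumes every step between adjacent grades is the same size.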
Practical Applications
The default risk model can be used by potential investors to decide which loans offered by LendingClub to invest in.
- E.g. an inexperienced individual would likely base their decision on just the grade, the interest rate and maybe a few other variables.
- Our default risk model allows them to build an investment strategy based on their risk profile. Our goal is to provide a self-contained model that can be used (instead of just investing based on the grade/interest rate) to maximize returns for an acceptable level of risk.
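One way such a risk-based selection could look in code is sketched below. It is a deliberately simplified illustration, not the project's method: the one-period expected-return formula, the `loss_given_default` parameter, and the function names are all assumptions, and the inputs stand in for predicted default probabilities and loan interest rates.

```python
import numpy as np


def expected_return(p_default, int_rate, loss_given_default=1.0):
    """Simplified one-period expected return per unit invested.

    Assumes a loan either repays in full with interest or loses a fixed
    fraction of the principal (an assumption made for this sketch).
    """
    p = np.asarray(p_default, dtype=float)
    r = np.asarray(int_rate, dtype=float)
    return (1.0 - p) * r - p * loss_given_default


def select_loans(p_default, int_rate, max_default_prob):
    """Return indices of loans within the investor's risk limit,
    ordered from highest to lowest expected return."""
    p = np.asarray(p_default, dtype=float)
    returns = expected_return(p, int_rate)
    eligible = np.flatnonzero(p <= max_default_prob)
    return eligible[np.argsort(-returns[eligible])]


if __name__ == "__main__":
    probs = [0.02, 0.30, 0.08, 0.05]   # hypothetical predicted default probabilities
    rates = [0.06, 0.25, 0.14, 0.07]   # hypothetical interest rates
    print(select_loans(probs, rates, max_default_prob=0.10))
```

The `max_default_prob` argument plays the role of the investor's risk profile: a lower value gives a more restrictive portfolio, and the high-interest but high-risk loan in the example is excluded regardless of its rate.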